home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Language/OS - Multiplatform Resource Library
/
LANGUAGE OS.iso
/
dsp
/
fft
/
fft_eyal.lha
/
fft_eyal
/
readme
< prev
next >
Wrap
Text File
|
1991-09-01
|
6KB
|
141 lines
This is the second release of these programs. The first release (packaged
by Udi Finkelstein) included only the m68k programs.
This is my implementation of integer real fft computation. It was
designed for high speed analysis of audio input in real time on a
PC/AT. It has support for generic C and optimized assembly for:
- intel 80x86
- motorola 68k
- natsemi 32k
The assembler is usualy 4-8 times faster than the generic C so it is worth
using.
The data used was 8bit but the routine does 16bit signed arithmetic.
The algorithm of choice was one which minimizes mults while keeping the
number of needed registers to a minimum (the Intel chip does not have
enough). Data if gradualy scaled in flight to avoid overflow.
A standard (2,4) split-radix was used as described in [Sorensen et al.,
IEEE ASSP-35 no 6., June 87, pp. 849-863]. Other algoriths (WFT etc.)
where found to use too complex arithmetic for efficient implementation
on a machine with a decent mult instruction because register spills slow
it more than the reduce mults speed it.
WHAT IT DOES NOT DO:
- NO inverse fft.
- NO complex input.
- NO windowing.
- output is power spectrum WITHOUT the square root taken.
- the sample data array (x[]) is not descramled so you cannot use it directly.
see how the power spectrum is derived if you want to generate phase or
whatever. fft7() and fft8() function do this post-processing.
The method of implementaion was to have a program (fftg.c) that
generates the fft routine which is then compiled/assembled and used.
The generated program is assembly, but for portability this release
generates a C program too. I have routines for Intel assembly and ns32k
assembly, other machines can have the basic routines re-written without
much effort (but for best performance one would optimize these routines
extensively).
The archive contains a sample program (fft2.c) that uses the fft
routine. It draws some basic picture on a screen using move() and
draw() function which you will have to supply (it uses the microsoft c
graphics library in hercules mode using msherc.com). To run the test
program: 'fft2a' or 'fft2c' or 'fft2f'. you can try 'fft2a x' which
will call fft 10000 times and display the time, then do 'fft2a x x' to
measure the overhead of the loop, then a result of (e.g.) 29 from the
first and 3 from the second means an average time of 2.6ms/fft. When
the program asks for a data file respond 'd1' of 'd2' etc. or 'end' to
stop.
Originaly the program was a BASIC program from some British mag a few
years ago which started my interest; it took a few minutes to do one
fft. The current version does a 256 point fft on a 25MHz ns32532 in
2.6ms(asm)/4.2ms(c)/5.8ms(funcs), on a 10MHz 80286 it takes 7ms(asm, it
has a rather fast int mult)/38.8ms(c)/44.4ms(funcs). As you see, gcc on
the ns32k compares better to hadcrafted assembly than msc5.1.
I now use a 20MHz 80286. If you sample at 44Khz, 256 points take 5.8ms
to sample but only 2.9ms to do the fft (assuming the sampling is done
via DMA, inerrupts would eat too much cpu time), so you have 2.9ms for
other stuff (display? this is slow unless you really try. I wrote my
low level display routines or the fft gain would be lost). On a 386/486
you are laughing.
The package contains one ready fft() in the file fft8c.f; it does a 256
point fft in the slowest way but it should compile readily.
A program to measure the time is also supplied as fft1.c. You will need to
link the following:
fft1.o fft8f.o fftsubs.o
then run the result to see how fast it goes. Be patient, the test takes about
one minute. On some machine (where 'mult' has variable time) this test will
give only the typical speed.
To get the generator programs 'make progs'. A generator program is built
from two modules: fft.g (the core) and fftout?.c (the output driver). You
should choose one of the drivers or write your own if there is no support
for your machine (in which case I would like to hear from you). The driver
fftoutf.c generates a C program that calls functions in fftsubs.c; this
should work on every system.
Once you have the generator, you need to run it and generate the fft program:
fftgf fft8f 8 fft
this will build a 256 (2^8) points fft() function into fft8f.c. Now compile
and then link into your application.
All of the above is done in the makefile, look at it.
To get the fft programs 'make fft'. The 'c' version uses fftsubs.h. The
'f' version uses C function calls which is a bit slower but uses far
less memory; it uses fftsubs.c. The '86' version uses fftsubs.mac, Note
that the makefile generates a 256 point routine with entry point fft(),
you may want to change it for your needs or just edit the output.
fftouts.c generates ns32k assembly, fftouta.c generates Intel assembly.
These should give a good idea of how to port the assembly level
routines to another machine.
To get the test programs 'make tests' (but first fix fft2.c for the
graphics if needed).
MSDOS NOTE: The c output has to be split into smaller files because a)
the compiler wont optimize a large prog b) the compiler dies on any
larger progs, and c) the code segment would be too large. The output
routines in fftoutc.c will break the output every 1000 statements. This
is fine for msc5.1, other compilers may cope with larger programs, so
just up the limit in break_fft().
HOW TO MODIFY FFT:
The program to modify is fftg.c and the related fftout?.c that you use.
You would probably modify the latter.
In general, the calls to fft1...fft5 handle the fft proper while
fft7+fft8 do the final power spectrum and reordering. The last two is
where you extract the results you wanted from x[] into qf[]. x[] holds
the real and complex parts in the order:
xx[0-255]= R[0],...,R[128],I[127],...,I[1]
but in this program xx[i] is in position x[mm[i]].
HOW TO CALL FFT:
In your program header include:
short near x[256] = {0}; /* some compilers need to initialize */
short near qf[128+1] = {0};
extern far fft (); /* or whatever you called it */
Then in the program:
/* read the data points into x[0-255]*/
fft ();
/* the power spectrum is in qf[0-128], qf[0] is the DC componnent */
Thats all.
Please let me know of problems/difficulties you encounter with these
programs as well as any suggestions/improvements.
Regards
Eyal Lebedinsky eyal@ise.canberra.edu.au